Knowledge Discovery in Literature Data Bases

Authors

  • Rudolf Albrecht
  • Dieter Merkl
Abstract

The concept of knowledge discovery as defined through “establishing previously unknown and unsuspected relations of features in a data base” is, cum grano salis, relatively easy to implement for data bases containing numerical data. Increasingly we find at our disposal data bases containing scientific literature. Computer assisted detection of unknown relations of features in such data bases would be extremely valuable and would lead to new scientific insights. However, the current representation of scientific knowledge in such data bases is not conducive to computer processing. Any correlation of features still has to be done by the human reader, a process which is plagued by ineffectiveness and incompleteness. On the other hand we note that considerable progress is being made in an area where reading all available material is totally prohibitive: the World Wide Web. Robots and Web crawlers mine the Web continuously and construct data bases which allow the identification of pages of interest in near real time. An obvious step is to categorize and classify the documents in the text data base. This can be used to identify papers worth reading, or which are of unexpected cross-relevance. We show the results of first experiments using unsupervised classification based on neural networks.

1. The Astronomical Research Process

The research process has only recently been defined in epistemological terms: the model developed by Sir Karl Popper (1972) comes closest to what most natural scientists do when they “do science”. Briefly, the research process starts with the input of signals, either through sensory perception, or through measuring devices which register signals which ...

Astrophysics Division, Space Science Department, European Space Agency


Similar resources

Tools for knowledge acquisition within the NeuroScholar system and their application to anatomical tract-tracing data

BACKGROUND Knowledge bases that summarize the published literature provide useful online references for specific areas of systems-level biology that are not otherwise supported by large-scale databases. In the field of neuroanatomy, groups of small focused teams have constructed medium size knowledge bases to summarize the literature describing tract-tracing experiments in several species. Desp...


Augmenting Medical Databases with Domain Knowledge

In this paper we discuss a method of linking databases with domain knowledge to provide an extended semantics for use with statistical machine learning and automated discovery programs. We focus on the use of data in conjunction with domain knowledge for automated discovery in medical databases and show how an induction program can find new knowledge in a database by reasoning about classes and re...


An Empirical Study of Combined Classifiers for Knowledge Discovery on Medical Data Bases

This paper compares the accuracy of combined classifiers in medical data bases to the same knowledge discovery techniques applied to generic data bases. Specifically, we apply Bagging and Boosting methods for 16 medical and 16 generic data bases and compare the accuracy results with a more traditional approach (C4.5 algorithm). Bagging and Boosting methods are applied using different numbers of...


Knowledge Discovery in Environmental Data Bases using GESCONDA

In this work, the latest results of the research project “Development of an Intelligent Data Analysis System for Knowledge Management in Environmental Data Bases (DB)” are presented. The project is focused on the design and development of a prototype for Knowledge Discovery (KD) and intelligent data analysis, and is especially oriented to environmental DB. It is remarkable the high quantity of informati...


A structural informatics approach to mine kinase knowledge bases.

In this paper, we describe a combination of structural informatics approaches developed to mine data extracted from existing structure knowledge bases (Protein Data Bank and the GVK database) with a focus on kinase ATP-binding site data. In contrast to existing systems that retrieve and analyze protein structures, our techniques are centered on a database of ligand-bound geometries in relation ...




Publication date: 1998